Follow us on twitter!
We’re going to try to cover everything except Model.
This image is from R for Data Science, a great text to get started with. It’s available online as a free ebook.
To familiarise ourselves with R, we will do what R users do.
We will explore a dataset.
A fitting dataset.
A few images to explain why a dataset about witch trials might be appropriate for a workshop hosted by an advocacy group for underrepresented genders.
Group discussion
What would you like to know?
Write questions on whiteboard
For this workshop:
“The plane is pretty boring without the airport around it.”
(Tip of the hat to Julia Lowndes for the aeroplane analogy.)
The installation instructions are adapted with appreciation from a previous workshop.
Go to the Comprehensive R Archive Network (CRAN) website.
It was the first result in a Google search for ‘cran’ in June 2018.
Go to the RStudio website.
It was the first result in a Google search for ‘rstudio’ in June 2018.
Choose RStudio and scroll down to the blue Download RStudio Desktop button.
Click the green button to download RStudio Desktop Open Source License and select the appropriate installer for your operating system.
Double click the installer and follow the prompts to set up RStudio.
To install R:
First run sudo apt-get update to refresh the package lists,
then run sudo apt-get install r-base to install R.
On most distributions:
Download the .deb file and double click to open the package installer.
# Check R version
version
## _
## platform x86_64-pc-linux-gnu
## arch x86_64
## os linux-gnu
## system x86_64, linux-gnu
## status
## major 3
## minor 4.4
## year 2018
## month 03
## day 15
## svn rev 74408
## language R
## version.string R version 3.4.4 (2018-03-15)
## nickname Someone to Lean On
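The version can also be checked from inside a script. The following is a minimal sketch using base R's getRversion() and R.version.string; the threshold "3.4.0" is an arbitrary value chosen for illustration, not a requirement of this workshop.

```r
# Print the full version string of the running R session
print(R.version.string)

# getRversion() returns a version object that can be compared
current <- getRversion()

# "3.4.0" is an illustrative threshold only (an assumption, not a workshop requirement)
is_recent <- current >= "3.4.0"
print(is_recent)
```

If is_recent is FALSE, updating R before continuing is probably worthwhile.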
On Windows, R and R packages can be updated with the installr package (adapted from a blog post).
# installing/loading the package:
if (!require(installr)) {
  # install installr if it is missing, then load it
  install.packages("installr")
  require(installr)
}
## Loading required package: installr
## Warning in library(package, lib.loc = lib.loc, character.only = TRUE,
## logical.return = TRUE, : there is no package called 'installr'
## Warning in install.packages :
## unable to access index for repository https://cloud.r-project.org/src/contrib:
## cannot open URL 'https://cloud.r-project.org/src/contrib/PACKAGES'
## Installing package into '/home/pandagrrl/R/x86_64-pc-linux-gnu-library/3.4'
## (as 'lib' is unspecified)
## Warning in install.packages :
## unable to access index for repository https://cloud.r-project.org/src/contrib:
## cannot open URL 'https://cloud.r-project.org/src/contrib/PACKAGES'
## Warning in install.packages :
## package 'installr' is not available (for R version 3.4.4)
## Loading required package: installr
## Warning in library(package, lib.loc = lib.loc, character.only = TRUE,
## logical.return = TRUE, : there is no package called 'installr'
# using the package:
# updateR() # this will start the updating process of your R installation. It will check for newer versions, and if one is available, will guide you through the decisions you'd need to make.
Working in an RStudio project has many benefits.
R-Ladies presenters gesticulate wildly at RStudio
Particularly useful panes:
Help-Cheatsheets-RStudio IDE Cheatsheet
RStudio projects make it straightforward to divide your work into multiple contexts, each with their own working directory, workspace, history, and source documents.
A recommendation on how to organize your R project from Good Enough Practices for Scientific Computing as summarized by Software Carpentry
Now create an R Project:
- Open RStudio and create a project via File-New Project
- Select New Directory and choose New Project
- Name your project rcurious
- Save the project directory wherever suits you
The console is where you can execute single-line R commands.
The console is located, by default, in the lower left pane.
Try 3 + 2 and run using ctrl-enter or the play button:
# Annotate with comments using the #. If you precede anything with this sign, R will ignore it.
3 + 2
## [1] 5
We can annotate a script with comments using the #. If you precede anything with this sign, R will ignore it.
In R, the fundamental unit of shareable code is an R package.
R packages are available from The Comprehensive R Archive Network (CRAN) or GitHub.
We’re going to use the metapackage tidyverse available from CRAN to help us with our data analysis.
install.packages("arbitrarypkg")
library(arbitrarypkg)
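The two calls above can be combined so that a package is only downloaded when it is missing. This is a rough sketch: load_or_install is a hypothetical helper name, not a function from any package, and it is demonstrated with stats, which ships with R, so nothing is downloaded here.

```r
# Hypothetical helper (not part of any package):
# install a package only if it is missing, then load it.
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # needs an internet connection
  }
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# "stats" ships with R, so this call never triggers a download.
load_or_install("stats")
```

Installing happens once per computer; loading with library() happens once per R session.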
I can store the number 5 in an object x.
To assign a value we use an arrow <-.
x <- 5
What happens when you type x into the Console after assigning the value 5 to it?
What do you see in the Environment pane?
(control + 8 to switch focus to Environment pane.)
x <- 5
x
## [1] 5
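Objects can be reused in calculations and overwritten. A short sketch continuing from x; the object name y is introduced here purely for illustration.

```r
x <- 5      # store 5 in x
y <- x * 2  # use x in a calculation; y now holds 10
x <- x + 1  # assigning again replaces the old value; x now holds 6

print(x)
print(y)
```

Note that y keeps its value when x changes: it stored the result of the calculation, not a link to x.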
Data structures in R
Data objects can be vectors of:
- numbers
- characters
- logical values
Or tables of data.
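A minimal sketch of these structures in base R. All object names and values below are made up for illustration.

```r
# A vector holds values of a single type; c() combines values into a vector.
numbers    <- c(1.5, 2, 10)
characters <- c("a", "witch", "trial")
logicals   <- c(TRUE, FALSE, TRUE)

class(numbers)     # "numeric"
class(characters)  # "character"
class(logicals)    # "logical"

# A data frame is a table whose columns are vectors of equal length.
tbl <- data.frame(id = numbers, label = characters, flag = logicals)
dim(tbl)  # number of rows, then number of columns
```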
Today we will work with a tidy data structure.
The documentation for the R package tidyverse is available online and on GitHub.
We’ll do this analysis in R markdown.
Open File-New File-R Markdown
This will open an Untitled1.Rmd template.
To open a code chunk in your .Rmd: control+alt+i
To knit to .html in the Viewer pane: save and control+shift+k
We’re going to use the metapackage tidyverse to help us with our data analysis.
The two important functions are install.packages() and library().
install.packages("arbitrarypkg")
library(arbitrarypkg)
We would like to install a package called “tidyverse”. Let’s try.
# I ran: install.packages("tidyverse") in the console the first time.
library(tidyverse)
# Store a string in an object.
url <- "<character string>" # placeholder: replace with the address of a csv file
# Read data using a tidyverse function.
db <- read_csv(url)
# We can also write this function with the package name, ie the read_csv function from the readr package
db <- readr::read_csv(url)
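The package::function notation works for any installed package, whether or not it has been loaded with library(). As a small sketch, here is the same idea with median from the stats package, used as a stand-in for readr because stats ships with R and needs no installation.

```r
values <- c(3, 1, 2)

# Both calls run the same function; the second names its package explicitly.
a <- median(values)
b <- stats::median(values)

identical(a, b)
```

Writing the package name is helpful when two loaded packages contain functions with the same name.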
Data Source
The data for this import example lives in the GitHub repository https://github.com/JakeRuss/witch-trials. First take a look at the link: click on the data folder, then trials.csv. To import the data we need a direct link to the file itself. You can get this by clicking the Raw button on GitHub. This is the URL we want to use.
We will use a function called read_csv to load the data into a variable called witchdat.
url <- "https://raw.githubusercontent.com/JakeRuss/witch-trials/master/data/trials.csv"
witchdat <- read_csv(url)
## Parsed with column specification:
## cols(
## year = col_integer(),
## decade = col_integer(),
## century = col_integer(),
## tried = col_integer(),
## deaths = col_integer(),
## city = col_character(),
## gadm.adm2 = col_character(),
## gadm.adm1 = col_character(),
## gadm.adm0 = col_character(),
## lon = col_double(),
## lat = col_double(),
## record.source = col_character()
## )
The data has now been loaded into witchdat in R; take a look at the Environment pane.
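A loaded table can also be inspected from the console. The sketch below uses a tiny stand-in data frame called toy so that it runs without an internet connection; its column names are borrowed from the column specification above, but every value is invented for illustration and is not taken from trials.csv. The same functions can be applied to witchdat.

```r
# Made-up rows for illustration only; NOT values from the real dataset.
toy <- data.frame(
  year   = c(1600L, 1601L, 1602L),
  tried  = c(2L, 5L, 1L),
  deaths = c(0L, 3L, NA),
  city   = c("A", "B", "C"),
  stringsAsFactors = FALSE
)

dim(toy)            # number of rows and columns
names(toy)          # column names
head(toy, 2)        # first two rows
str(toy)            # type of each column
summary(toy$tried)  # summary statistics for one column
```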
At some point you’ll need to use the command line.
To break ourselves in, let’s check what version of R we are running.
R --version
## R version 3.4.4 (2018-03-15) -- "Someone to Lean On"
## Copyright (C) 2018 The R Foundation for Statistical Computing
## Platform: x86_64-pc-linux-gnu (64-bit)
##
## R is free software and comes with ABSOLUTELY NO WARRANTY.
## You are welcome to redistribute it under the terms of the
## GNU General Public License versions 2 or 3.
## For more information about these matters see
## http://www.gnu.org/licenses/.